The Equivalence Interval as a Measure of Uncertainty¹

The concept of uncertainty has become increasingly important for
understanding the decisions people make in a wide variety of situations: from
organizational level decisions that have significant impact upon an
organization's effectiveness (e.g. Downey & Slocum, 1975; Lawrence & Lorsch,
1967; Thompson, 1967) to fairly mundane sorts of individual level decisions,
such as selecting among bets (e.g. Ellsberg, 1961; Raiffa, 1961) and
employment conditions (e.g. Larson, Note 1; Larson & Mitchell, 1977) in
contrived experimental settings. Yet relatively little empirical work has been
done to delineate either the personal or situational determinants of
uncertainty. Moreover, the research that has been done has relied on criterion
measures of uncertainty that not only are applicable in just a few settings,
but that also tend to have rather low reliability and validity (e.g. Downey &
Slocum, 1975; Downey, Hellriegel & Slocum, 1975). The primary purpose of the
present study was to investigate the usefulness of a new measure of
uncertainty which may be applied in a wide variety of settings.

Uncertainty can be defined as a subjective state in which individuals feel
unable to make precise judgments about some characteristic of a given entity,
situation, relationship, or event (Larson & Mitchell, 1977). The less precise
the judgments, the more uncertain the individuals are about the characteristic
in question. Defined in this manner, uncertainty is closely related to
confidence in the accuracy of a judgment: As uncertainty increases, confidence
in the accuracy of a judgment should decrease.

Using this definition, it seems reasonable to measure uncertainty by
structuring the judgment task to allow individuals to respond in either more
or less precise terms. A measurement technique used to investigate ranges of
subjectively acceptable error is well adapted to this purpose (e.g. Beach et
al., 1974; Beach & Solak, 1969; Laestadius, 1970). Respondents are first asked
to make judgments about some quantitative characteristic of a given stimulus
(e.g. weight, size, net earnings, etc.). Then, supposing that their answers
are not exactly correct, they are asked to go back and indicate the range of
possible values of the characteristic in question that the stimulus could have
and still leave them confident that their original judgment was essentially
correct, or "in the ballpark." This range of values is termed an equivalence
interval, since it is assumed that all of the values falling within it are
perceived by the respondents as essentially equivalent to their initial
judgment in terms of accuracy. With regard to the present discussion, the size
of the equivalence interval can be taken as a measure of uncertainty: The more
uncertain individuals are about the correctness of their judgments, the larger
should be the size of their equivalence intervals. Therefore, it was
hypothesized that the width of the equivalence interval will be highly
correlated with a rating of confidence in the accuracy of a judgment.

A secondary purpose of the present study was to investigate the effect of
having readily available standards of comparison on judgment uncertainty. The
judgment process is by its very nature a comparative one in which the
characteristic to be judged is compared to some known standard or anchor point
(cf. Stevens, 1966). It was hypothesized that as the actual value of the
characteristic to be judged approaches a clearly defined anchor point,
judgments about that value will become easier to make, and individuals will
therefore tend to be less uncertain about their accuracy. Conversely,
individuals will in general be more uncertain about the accuracy of such
judgments the further the actual value of the characteristic in question is
from a clearly defined anchor point. This hypothesis was tested by assessing
subjects' uncertainty about judgments involving stimuli that were either near
to or far from a clearly defined anchor point.

Method 

Overview 

Subjects were asked to make a series of 15 judgments, one about each of five
stimuli in three different stimulus classes. Within each class the stimuli
varied in the extent to which they were near to or far from either the maximum
or minimum possible value of that stimulus. Steps were taken to establish the
maximum and minimum possible values of each stimulus class as clearly defined
anchor points. After making each judgment the subjects were asked to use one
of two methods to indicate how uncertain they were about the accuracy of that
judgment. Half of the subjects used a separate bipolar rating scale to
indicate their uncertainty, while the other half constructed equivalence
intervals.

Subjects 

Sixty undergraduate students enrolled in lower division psychology courses at
the University of Washington participated in the study one at a time. They
each received one half hour of experimental credit for participating.

Judgment Task

The subjects were required to make five separate judgments of fullness,
numerocity, and time. A different type of stimulus was used for each type of
judgment.

Fullness. The first set of stimuli consisted of five small sealed opaque paper
cartons of uniform size and shape. Each carton held a different number of
marbles. The subjects were required to examine each carton and estimate the
number of marbles it held. The cartons held 6, 24, 42, 60, and 78 marbles
respectively. The subjects were informed that a maximum of 85 marbles could be
fit into any one carton. They were asked to indicate their best guess about
the exact number of marbles in each carton by placing an "X" at the
appropriate point on a number line marked with 86 points, from 0 to 85.

Numerocity. The second set of stimuli consisted of five slides of 100 red and
blue disks intermixed randomly in a 10 x 10 matrix. The slides were presented
tachistoscopically one at a time for approximately .25 seconds. Each slide
showed a different number of red and blue disks. The subjects were required to
estimate the number of red disks pictured in each. The five slides contained
8, 29, 50, 71, and 92 red disks respectively. The maximum possible number of
red disks was 100. The subjects were asked to indicate their best guess about
the exact number of red disks pictured in each slide by placing an "X" at the
appropriate point on a number line ranging from 0 to 100.

Time. The final set of stimuli consisted of five time intervals: 6, 18, 30,
42, and 54 seconds. The subjects were asked to estimate the length of time
that elapsed between two signals given by the experimenter. To prevent them
from counting or using some other method to record the passage of time, the
subjects were required to read a long series of three-letter nonsense
syllables presented individually on index cards during each interval. The
subjects were told that no interval would be longer than 60 seconds. Again,
they were asked to indicate their best guess about the exact length of each
time interval by placing an "X" at the appropriate point on a number line
ranging from 0 to 60 seconds.

Procedure

The experimenter began by describing the purpose of the study, stating that
people's accuracy in judging various characteristics of a wide variety of
stimuli was being investigated. The fullness, numerocity, and time judgment
tasks were then described. When the subjects indicated that they understood
what they were to do, they were presented with the first set of stimuli. For
each set of stimuli the subjects were first given two standards representing
the maximum and minimum possible values of the stimulus class. Thus, for
example, before making the fullness judgment the subjects were given two
cartons identical to those about which they had to make a judgment. One of
these cartons was completely empty, while the other held 85 marbles, the
maximum number that it could possibly hold. These two cartons were clearly
labeled with the number of marbles they held. The subjects were encouraged to
use these as standards of comparison when making their estimates about the
number of marbles in each of the unknown cartons. Similarly, for the
numerocity judgments two labeled slides, one composed of 100 red disks and the
other composed of 100 blue disks, were presented before the set of unknown
slides was presented. For the time judgment the subjects were given two
initial practice trials lasting for 60 seconds each. The length of these two
practice trials was clearly stated by the experimenter both before and after
they occurred. Each subject made the fullness judgments first, followed by the
numerocity judgments, and then the time judgments. However, the five stimuli
within each judgment type were presented in different random orders.

Dependent Measures

Two different measures of uncertainty about the accuracy of each judgment were
obtained. All of the subjects first reported their best guess about the exact
value of each stimulus. Then, half of the subjects were asked to construct an
equivalence interval around this guess in the manner described above: They
indicated the range of values, both above and below their guess, within which
they were reasonably certain that the correct answer would lie and outside of
which they were reasonably certain that the correct answer did not lie.

The remaining half of the subjects were asked to report how confident they
were about the accuracy of each judgment by placing an "X" at the appropriate
point on a separate 21-point bipolar adjective scale ranging from "quite
confident" to "not at all confident." For ease of comparison with the
equivalence interval measure, the responses to the confidence ratings were
scored so that "quite confident" received a value of 0 and "not at all
confident" received a value of 20. Scores computed in this way thus reflect
the subjects' degree of "non-confidence."
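
As a minimal sketch of how the two dependent measures could be scored (this
is an illustration, not the scoring procedure actually used in the study), the
Python below treats uncertainty under the equivalence interval measure as the
width of the interval and converts a mark on the 21-point scale to a
non-confidence score from 0 to 20. The function names are hypothetical, and
the numbering of scale positions from 1 to 21, starting at the "quite
confident" pole, is an assumption made only for the example.

```python
def interval_width(lower: float, upper: float, guess: float) -> float:
    """Width of an equivalence interval drawn around a best guess."""
    if not lower <= guess <= upper:
        raise ValueError("the interval must contain the best guess")
    return upper - lower


def non_confidence(position: int) -> int:
    """Score a mark on the 21-point scale.

    Assumes positions are numbered 1..21 from the "quite confident" pole,
    so that "quite confident" scores 0 and "not at all confident" scores 20.
    """
    if not 1 <= position <= 21:
        raise ValueError("position must be between 1 and 21")
    return position - 1


# Hypothetical responses, for illustration only.
print(interval_width(lower=30, upper=55, guess=42))  # 25
print(non_confidence(21))                            # 20
```

On both measures a larger number means greater uncertainty, which is what
makes the two sets of means directly comparable.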

Results 

Judgment  Uncertainty 

Figure 1 shows the mean uncertainty ratings for each judgment using both the
equivalence interval measure and the non-confidence measure. Repeated measures
analyses of variance were computed for each judgment type using the
equivalence interval measure and the non-confidence measure separately as
dependent variables. The F-ratios from these analyses are presented in
Table 1.

Insert  Figure  1 and  Table  1 about  here 

Figure 1. Mean uncertainty ratings using the equivalence interval measure and
the non-confidence measure. (Plot labels: Equivalence Interval Measure;
Non-Confidence Measure.)


Table 1

F-Ratios from the Analyses of Variance Using the Equivalence Interval
Measure and the Non-Confidence Measure

                                       F-Ratios
Judgment Type          Overallᵃ        Linearᵇ         Quadraticᵇ

Equivalence Interval Measure
  Fullness             20.25***        21.19***         57.70***
  Numerocity           29.73***         9.14**         109.91***
  Time                 31.28***       102.82***         22.21***

Non-Confidence Measure
  Fullness             21.12***         3.73            79.08***
  Numerocity           48.42***          .83           191.88***
  Time                 15.14***        34.13***         20.34***


As can be seen in Figure 1, there was a high degree of similarity in the
overall pattern of means for the two different measures. The correlation
between the five equivalence interval measure means and the five
non-confidence measure means is .72 for the fullness judgments, .97 for the
numerocity judgments, and .92 for the time judgments. The coefficients are all
highly significant,² indicating a substantial overlap in the variance
explained by the two measures.
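
The correlations reported here are computed across the five stimulus-level
means of each measure, not across individual scores. A minimal sketch of that
computation follows. The two lists of means are hypothetical placeholders
(the actual means appear only in Figure 1), and `pearson_r` is an illustrative
helper name, not something taken from the study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    if len(x) != len(y) or len(x) < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical means at the five stimulus levels, for illustration only.
interval_means = [8.0, 14.0, 17.0, 15.0, 11.0]       # equivalence interval widths
non_confidence_means = [4.0, 9.0, 12.0, 10.0, 6.0]   # 0-20 non-confidence scores

print(round(pearson_r(interval_means, non_confidence_means), 2))
```

Because only five pairs of means enter each coefficient, a correlation of this
kind has three degrees of freedom.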

The overall treatment effect for each judgment type was highly significant for
both the equivalence interval measures and the non-confidence measures. More
important, both of these measures demonstrated the predicted pattern of
uncertainty for the fullness and numerocity judgments. The pattern of means
for both judgment types, along with the highly significant quadratic component
in each analysis, indicated that the subjects became less uncertain about the
accuracy of their fullness and numerocity judgments as the actual value of the
stimulus approached either the maximum or minimum possible value. As the
actual value of the stimulus approached the point mid-way between these two
extremes, the subjects became increasingly uncertain about the accuracy of
their judgments. This pattern is somewhat clearer for the non-confidence
measure than for the equivalence interval measure, since the analyses using
the latter also revealed significant linear components. These linear trends
appear to be due to the asymmetry of the effect. The subjects' degree of
uncertainty about the accuracy of their fullness and numerocity judgments did
not decrease as much when the actual stimulus value approached the maximum
possible value as when it approached the minimum possible value.

Unlike the uncertainty measures for the fullness and numerocity judgments, the
uncertainty measures for the time judgments did not follow the predicted
pattern. Rather, the means from both the equivalence interval measure and the
non-confidence measure for the time judgments are best described by a linear
trend: Subjects became more and more uncertain about the accuracy of their
time estimates as the length of the stimulus increased. The quadratic
components for both analyses did reach significance, but these seem to reflect
primarily a ceiling effect at the higher stimulus levels. Up to a point, as
the length of the stimulus increased so too did the subjects' uncertainty.
Beyond this point, however, further increases in the length of the stimulus
did not lead to increased uncertainty.
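
The linear and quadratic components discussed above can be illustrated with
the standard orthogonal polynomial weights for five equally spaced levels (the
stimuli in each class were equally spaced). The sketch below is not the
analysis of variance itself: it only computes the raw contrast values, and the
two sets of means are hypothetical patterns chosen to show a pure inverted U
and a pure straight-line increase.

```python
# Standard orthogonal polynomial weights for five equally spaced levels.
LINEAR = (-2, -1, 0, 1, 2)
QUADRATIC = (2, -1, -2, -1, 2)


def contrast(means, weights):
    """Weighted sum of the level means for one trend component."""
    if len(means) != len(weights):
        raise ValueError("means and weights must have the same length")
    return sum(m * w for m, w in zip(means, weights))


# Hypothetical mean uncertainty at the five stimulus levels.
inverted_u = [6.0, 12.0, 15.0, 12.0, 6.0]   # highest at the mid-point
rising = [4.0, 8.0, 12.0, 16.0, 20.0]       # steadily increasing

print(contrast(inverted_u, LINEAR), contrast(inverted_u, QUADRATIC))  # 0.0 -30.0
print(contrast(rising, LINEAR), contrast(rising, QUADRATIC))          # 40.0 0.0
```

With these weights a negative quadratic contrast corresponds to means that
peak in the middle of the range, and a symmetric inverted U produces no linear
component at all. An asymmetric inverted U would yield non-zero values on
both, which is the situation described for the equivalence interval measure.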

Judgment  Accuracy 

It is of further interest to examine the accuracy of the subjects' best
guesses about the exact value of each stimulus. This can be done by computing
the absolute difference between each subject's guess and the actual stimulus
value, resulting in a judgment error score. The mean error score for each
judgment is presented in Table 2. Separate repeated measures analyses of
variance were computed for each of the three judgment types. The F-ratios
from these analyses also are presented in Table 2.
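
The judgment error score defined above is simply the absolute difference
between a guess and the actual stimulus value. A minimal sketch, assuming the
guesses are grouped by actual stimulus value, is given below. The guesses are
hypothetical, and the function names are illustrative rather than taken from
the study.

```python
from statistics import mean


def error_score(guess: float, actual: float) -> float:
    """Absolute difference between a best guess and the actual value."""
    return abs(guess - actual)


def mean_error_by_level(guesses_by_level):
    """Map each actual stimulus value to the mean error of its guesses."""
    return {
        actual: mean(error_score(g, actual) for g in guesses)
        for actual, guesses in guesses_by_level.items()
    }


# Hypothetical guesses from three subjects for two of the fullness
# stimuli (cartons holding 6 and 42 marbles).
guesses = {6: [5, 8, 6], 42: [30, 50, 45]}
print(mean_error_by_level(guesses))
```

Because the score ignores the direction of the error, overestimates and
underestimates of the same size count equally.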

Insert  Table  2 about  here 

The overall treatment effect for each judgment type was highly significant.
The large quadratic components of both the fullness and numerocity analyses
suggest that the subjects became much more accurate in making these judgments
when the actual value of the stimulus approached either the maximum or minimum
possible value. As the actual value of the stimulus approached the point
mid-way between these two extremes the subjects' judgments tended to become
less accurate. This pattern is somewhat stronger for the numerocity judgments
than for the fullness judgments, as evidenced by the significant linear trend
in the latter. When the actual stimulus value approached the maximum possible
value for the fullness judgment the subjects' accuracy did not improve quite
as much as when the actual stimulus value approached the minimum possible
value. It is this asymmetry which apparently led to the significant linear
trend. Finally, while both the linear and quadratic components of the time
analysis reached significance, the linear component was clearly much stronger.
In general, the subjects' time estimates became less accurate as the length of
the stimulus increased.

Equivalence  Interval  Effectiveness 

It is possible to determine how often those subjects who constructed an
equivalence interval around each judgment actually enclosed the correct
stimulus value, and whether this varied according to stimulus level. This can
be done by assigning the subjects a 0 each time their equivalence interval
enclosed the correct value, and a 1 each time it did not. The mean
effectiveness score for each judgment is reported in Table 3. Separate
repeated measures analyses of variance were computed for each judgment type.
The F-ratios from these analyses also are presented in Table 3.
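
The effectiveness coding described above (0 for an interval that encloses the
correct value, 1 for one that does not) can be sketched as follows. The
intervals are hypothetical, and treating a value that falls exactly on an
endpoint as enclosed is an assumption of this example, since the text does not
say how such cases were scored.

```python
def effectiveness_score(lower: float, upper: float, actual: float) -> int:
    """Return 0 if the interval encloses the actual value, 1 if it misses.

    Endpoints are counted as enclosing the value (an assumption).
    """
    return 0 if lower <= actual <= upper else 1


def miss_rate(scores) -> float:
    """Mean effectiveness score, i.e. the proportion of intervals that miss."""
    return sum(scores) / len(scores)


# Hypothetical intervals from four subjects for a carton holding 42 marbles.
intervals = [(35, 50), (20, 38), (40, 60), (44, 52)]
scores = [effectiveness_score(lo, hi, 42) for lo, hi in intervals]
print(scores)             # [0, 1, 0, 1]
print(miss_rate(scores))  # 0.5
```

Under this coding a higher mean score indicates that the intervals were less
effective at capturing the correct value.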

Insert  Table  3 about  here 

As can be seen, the subjects generally constructed intervals that were too
narrow. Averaging over all fifteen judgments, they failed to enclose the
correct value nearly 42% of the time. More importantly, this failure to
enclose the correct stimulus value varied systematically across stimulus
levels. The significant quadratic component of both the fullness and
numerocity analyses indicates that for these two judgment types the subjects
were more likely to enclose the correct stimulus value when the actual
stimulus value approached either the maximum or minimum possible value. As the
actual value of the stimulus approached the point mid-way between these two
extremes, the subjects became increasingly less likely to enclose the correct
value. Again, since the linear component of the fullness analysis also reached
significance, this pattern is not quite as strong as it is for the numerocity
judgments. When the actual stimulus value approached the maximum possible
value for the fullness judgment the subjects' ability to enclose the correct
value did not improve as much as when the actual stimulus value approached the
minimum possible value. Finally, the subjects' ability to enclose the correct
stimulus value for the time judgments showed a slight, though significant,
tendency to decrease as the length of the stimulus increased. As the stimuli
became longer the subjects were less likely to enclose the correct value
within the equivalence interval.

Discussion 

As expected, the widths of the subjects' equivalence intervals were highly
correlated with their reported confidence in the accuracy of each judgment. As
the subjects' confidence in the correctness of their answers decreased, the
range of answers that they thought might reasonably be correct increased. It
thus seems justifiable to conclude that the equivalence interval is indeed an
alternative measure of uncertainty.

The findings from the present study also provide support for the hypothesis
that individuals will in general be more uncertain about their judgments the
further the actual stimulus value is from a clearly defined standard of
comparison, or anchor point. The results based on the fullness and numerocity
judgments are consistent with this prediction. As the actual stimulus value
approached either the maximum or minimum possible values the subjects became
less and less uncertain about the accuracy of their judgments. The results
based on the time judgments, on the other hand, follow a different pattern.
The subjects did seem to become less uncertain about the accuracy of their
time judgments as the actual length of the time interval approached the
minimum possible value. However, their uncertainty apparently did not decrease
when the length of the time interval approached the maximum possible value.
Rather, it tended to remain at a relatively high level.

The overall pattern of results, while not completely as predicted, can
nevertheless be explained in terms of the original uncertainty hypothesis if
it is assumed that some of the established maxima and minima did not provide
very clear standards of comparison. For example, in retrospect it seems quite
unlikely that the arbitrary 60 second maximum placed on the time intervals
provided the subjects with a very good standard of comparison. Even though
they were given two practice trials to help establish the 60 second interval
as an anchor point at the upper end of the scale, the subjects were probably
so unfamiliar with the exact duration of various time intervals in everyday
life that this procedure had relatively little impact. Therefore, while 0
seconds did provide a clear anchor point for making comparisons, 60 seconds
did not. If this is the case then it is not unreasonable for the subjects to
be just as uncertain about the accuracy of their time judgments at the very
high end of the continuum as they were at intermediate levels: In neither case
did they have a satisfactory standard of comparison for making their
judgments.

The subjects' uncertainty about the accuracy of their judgments at the various
stimulus levels closely paralleled their actual degree of accuracy at those
levels. When they were more uncertain about the accuracy of their judgments,
those judgments were in fact more inaccurate. Interestingly, however, those
who were given the opportunity to construct equivalence intervals around their
best guesses were not able to effectively compensate for their inaccuracy in
terms of being able to enclose the correct value within the interval. Even
though their equivalence intervals increased in size when their best guesses
were most likely to be wrong, they still failed to enclose the correct value
as much as 60% of the time in some cases. These findings suggest that the
subjects' level of uncertainty does not perfectly map onto their objective
probability of being correct. In general, the subjects' equivalence intervals
seem to indicate that they are more confident in the accuracy of their
judgments than they really should be. This seems to be particularly true at
the mid-points of the judgment scale, when a standard of comparison is not
readily available. Similar results have been obtained by Lichtenstein,
Fischhoff and Phillips (1977).

These findings raise an important question. In a sense, the equivalence
interval is the phenomenological counterpart of the statistical concept of a
confidence interval. Yet it says little about the phenomenological level of
confidence at which the subjects are operating. Is it the 95% level? The 60%
level? The 40% level? Or does the level of confidence vary across individuals
and situations? In order to fully understand the relationship between
uncertainty and decision making behavior this question needs to be answered.

Overall, the equivalence interval technique seems to be a useful way to
measure uncertainty and appears to have several advantages over other possible
measures. First, it is potentially applicable in a wide variety of situations.
Although the present study was concerned only with uncertainty about judgments
of physical and temporal characteristics, the equivalence interval technique
should work equally well for any quantitative dimension, such as uncertainty
about production costs, net earnings, and industry volatility. Second, it
should be possible to make specific predictions about behavior by observing
whether a critical stimulus value lies inside or outside the equivalence
interval. Production foremen, for example, should be much more likely to work
toward a 15% production increase if this value lies within what they perceive
as a reasonable range of possibilities. Finally, the equivalence interval
technique provides a vehicle that can be used to further explore both the
nature of uncertainty and its impact on behavior. Some work is already being
done, for example, to investigate how people's uncertainty about various
elemental aspects of the decision environment contributes to their overall
decision uncertainty (e.g. Johnson, Note 2). The equivalence interval
technique should prove to be quite useful in this regard.

² With three degrees of freedom, only the correlation coefficients for the
numerocity and time judgments are significant at the .05 level. This test of
significance is too conservative, however, since means are being correlated
instead of individual scores. The means are less influenced by measurement
error and are thus more stable than are individual scores. A more appropriate
test might be to use 28 degrees of freedom, based on the total number of
subjects contributing to each mean. Such a test suggests that all three
coefficients are highly significant, p < .001.